Fast Fourier Transforms on Distributed Memory Parallel Machines
One issue which is central in developing a general purpose subroutine on a distributed memory parallel machine is the data distribution. It is possible that users would like to use the subroutine with different data distributions. Thus there is a need to design algorithms on distributed memory parallel machines which can support a variety of data distributions. In this dissertation we have addressed the problem of developing such algorithms to compute the Discrete Fourier Transform (DFT) of real and complex data. The implementations given in this dissertation work for a class of data distributions commonly encountered in scientific applications, known as the block scattered data distributions. The implementations are targeted at distributed memory parallel machines. We have also addressed the problem of rearranging the data after computing the FFT. For computing the DFT of complex data, we use a standard radix-2 FFT algorithm which has been studied extensively in parallel environments. There are two ways of computing the DFT of real data that are known to be efficient in serial environments: (i) the real fast Fourier transform (RFFT) algorithm, and (ii) the fast Hartley transform (FHT) algorithm. However, in distributed memory environments both have excessive communication overhead. We restructure the RFFT and FHT algorithms to reduce this overhead. The restructured RFFT and FHT algorithms are then used in the generalized implementations which work for block scattered data distributions. Experimental results are given for the restructured RFFT and FHT algorithms on two parallel machines: the NCUBE-7, a hypercube MIMD machine, and the AMT DAP-510, a mesh SIMD machine. The performance of the FFT, RFFT and FHT algorithms with the block scattered data distribution was evaluated on the Intel iPSC/860, a hypercube MIMD machine.
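To make the block scattered (block-cyclic) distribution mentioned above concrete, the sketch below shows how a one-dimensional array is dealt out in fixed-size blocks to processes in round-robin order. This is an illustrative Python sketch, not code from the dissertation; the function name and the parameters B (block size) and P (process count) are ours.

```python
# Illustrative sketch of a 1-D block scattered (block-cyclic) distribution:
# global index g lives in block g // B, and blocks are dealt to P processes
# round-robin.

def owner_and_local_index(g, B, P):
    """Return (process rank, local index) for global index g."""
    block = g // B                  # which block the element falls in
    rank = block % P                # blocks are assigned cyclically to ranks
    local_block = block // P        # blocks this rank already holds before it
    return rank, local_block * B + (g % B)

if __name__ == "__main__":
    N, B, P = 16, 2, 4
    for g in range(N):
        print(g, owner_and_local_index(g, B, P))
```

With B equal to N/P the mapping reduces to a pure block distribution, and with B = 1 it reduces to a cyclic distribution, which is why a subroutine supporting block scattered layouts covers both common cases.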
Extensible Component Based Architecture for FLASH, A Massively Parallel, Multiphysics Simulation Code
FLASH is a publicly available high performance application code which has
evolved into a modular, extensible software system from a collection of
unconnected legacy codes. FLASH has been successful because its capabilities
have been driven by the needs of scientific applications, without compromising
maintainability, performance, and usability. In its newest incarnation, FLASH3
consists of inter-operable modules that can be combined to generate different
applications. The FLASH architecture allows arbitrarily many alternative
implementations of its components to co-exist and interchange with each other,
resulting in greater flexibility. Further, a simple and elegant mechanism
exists for customization of code functionality without the need to modify the
core implementation of the source. A built-in unit test framework providing
verifiability, combined with a rigorous software maintenance process, allows the
code to operate simultaneously in the dual mode of production and development.
In this paper we describe the FLASH3 architecture, with emphasis on solutions
to the more challenging conflicts arising from solver complexity, portable
performance requirements, and legacy codes. We also include results from user
surveys conducted in 2005 and 2007, which highlight the success of the code.
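To illustrate the idea of interchangeable component implementations described above, here is a minimal Python sketch. It is not FLASH's actual setup machinery; the unit names (GridUnit, UniformGrid, AMRGrid) and the assemble function are hypothetical stand-ins for the general pattern.

```python
# Minimal sketch of a component ("unit") with several interchangeable
# implementations, where an application is assembled by choosing one
# implementation per unit.

class GridUnit:
    def refine(self):
        raise NotImplementedError

class UniformGrid(GridUnit):
    def refine(self):
        print("uniform grid: no refinement")

class AMRGrid(GridUnit):
    def refine(self):
        print("adaptive mesh refinement step")

IMPLEMENTATIONS = {"Grid": {"uniform": UniformGrid, "amr": AMRGrid}}

def assemble(choices):
    """Build an application from a {unit: implementation_name} mapping."""
    return {unit: IMPLEMENTATIONS[unit][name]() for unit, name in choices.items()}

app = assemble({"Grid": "amr"})
app["Grid"].refine()
```

The point of the pattern is that adding a new implementation only requires registering it against the unit's interface; existing applications keep working, while new applications can select it at assembly time.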
Fourth Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE4)
This report records and discusses the Fourth Workshop on Sustainable Software
for Science: Practice and Experiences (WSSSPE4). The report includes a
description of the keynote presentation of the workshop, the mission and vision
statements that were drafted at the workshop and finalized shortly after it, a
set of idea papers, position papers, experience papers, demos, and lightning
talks, and a panel discussion. The main part of the report covers the set of
working groups that formed during the meeting, and for each, discusses the
participants, the objective and goal, and how the objective can be reached,
along with contact information for readers who may want to join the group.
Finally, we present results from a survey of the workshop attendees.
Star Formation in the First Galaxies I: Collapse Delayed by Lyman-Werner Radiation
We investigate the process of metal-free star formation in the first galaxies
with a high-resolution cosmological simulation. We consider the cosmologically
motivated scenario in which a strong molecule-destroying Lyman-Werner (LW)
background inhibits effective cooling in low-mass haloes, delaying star
formation until the collapse of more massive haloes. Only when molecular
hydrogen (H2) can self-shield from LW radiation, which requires a halo capable
of cooling by atomic line emission, will star formation be possible. To follow
the formation of multiple gravitationally bound objects, at high gas densities
we introduce sink particles which accrete gas directly from the computational
grid. We find that in a 1 Mpc^3 (comoving) box, runaway collapse first occurs
in a 3x10^7 M_sun dark matter halo at z~12 assuming a background intensity of
J21=100. Due to a runaway increase in the H2 abundance and cooling rate, a
self-shielding, supersonically turbulent core develops abruptly with ~10^4
M_sun in cold gas available for star formation. We analyze the formation of
this self-shielding core, the character of turbulence, and the prospects for
star formation. Due to a lack of fragmentation on scales we resolve, we argue
that LW-delayed metal-free star formation in atomic cooling haloes is very
similar to star formation in primordial minihaloes, although in making this
conclusion we ignore internal stellar feedback. Finally, we briefly discuss the
detectability of metal-free stellar clusters with the James Webb Space
Telescope.
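As an illustration of the sink-particle technique mentioned in the abstract, the sketch below shows one way grid cells near a sink could hand excess mass to it. This is not the paper's implementation; every name, parameter, and threshold here is hypothetical.

```python
# Illustrative sketch of sink-particle accretion: cells within the sink's
# accretion radius that exceed a density threshold transfer their excess
# mass to the sink.

import numpy as np

def accrete(sink_pos, sink_mass, cell_pos, cell_rho, cell_vol, r_acc, rho_thresh):
    """Return updated sink mass and cell densities after one accretion sweep."""
    d = np.linalg.norm(cell_pos - sink_pos, axis=1)      # distance of each cell
    take = (d < r_acc) & (cell_rho > rho_thresh)          # cells eligible to accrete from
    excess = (cell_rho[take] - rho_thresh) * cell_vol[take]  # mass removed per cell
    cell_rho = cell_rho.copy()
    cell_rho[take] = rho_thresh                            # cap density at the threshold
    return sink_mass + excess.sum(), cell_rho

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pos = rng.random((100, 3))
    rho = rng.random(100) * 1e4
    vol = np.full(100, 1e-3)
    m, _ = accrete(np.array([0.5, 0.5, 0.5]), 0.0, pos, rho, vol,
                   r_acc=0.2, rho_thresh=5e3)
    print("accreted mass:", m)
```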
Modeling the Office of Science Ten Year Facilities Plan: The PERI Architecture Tiger Team
The Performance Engineering Institute (PERI) originally proposed a tiger team activity as a mechanism to target significant effort at optimizing key Office of Science applications, a model that was successfully realized with the assistance of two JOULE metric teams. However, the Office of Science requested a new focus beginning in 2008: assistance in forming its ten-year facilities plan. To meet this request, PERI formed the Architecture Tiger Team, which is modeling the performance of key science applications on future architectures, with S3D, FLASH and GTC chosen as the first application targets. In this activity, we have measured the performance of these applications on current systems in order to understand their baseline performance and to ensure that our modeling activity focuses on the right versions and inputs of the applications. We have applied a variety of modeling techniques to anticipate the performance of these applications on a range of expected future systems. While our initial findings predict that Office of Science applications will continue to perform well on future machines from major hardware vendors, we have also encountered several areas in which we must extend our modeling techniques in order to fulfill our mission accurately and completely. In addition, we anticipate that models of a wider range of applications will reveal critical differences between expected future systems, thus providing guidance for future Office of Science procurement decisions, and will enable DOE applications to fully exploit machines in future facilities.
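As a rough illustration of the kind of analytic performance modeling described above (not PERI's actual models), the toy sketch below predicts runtime as parallel compute time plus a simple latency/bandwidth communication term; all machine parameters and numbers are invented for illustration only.

```python
# Toy analytic performance model: serial work divided across p cores plus a
# latency/bandwidth term for per-step communication. Parameters are invented.

def predicted_runtime(work_flops, flops_per_sec, p,
                      msgs_per_step, bytes_per_step,
                      latency_s, bandwidth_bps):
    compute = work_flops / (p * flops_per_sec)                    # ideal parallel compute time
    comm = msgs_per_step * latency_s + bytes_per_step / bandwidth_bps
    return compute + comm

# Compare two hypothetical machines at 4096 cores.
for name, (flops, lat, bw) in {"machine A": (1e10, 5e-6, 1e9),
                               "machine B": (2e10, 2e-6, 5e9)}.items():
    print(name, predicted_runtime(1e15, flops, 4096, 100, 1e8, lat, bw))
```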
Programming Abstractions for Data Locality
The goal of the workshop and this report is to identify common themes and standardize concepts for locality-preserving abstractions for exascale programming models. Current software tools are built on the premise that computing is the most expensive component; we are rapidly moving to an era in which computing is cheap and massively parallel, while data movement dominates energy and performance costs. In order to respond to exascale systems (the next generation of high performance computing systems), the scientific computing community needs to refactor its applications to align with the emerging data-centric paradigm. Our applications must be evolved to express information about data locality. Unfortunately, current programming environments offer few ways to do so. They ignore the incurred cost of communication and simply rely on hardware cache coherence to virtualize data movement. With the increasing importance of task-level parallelism on future systems, task models have to support constructs that express data locality and affinity. At the system level, communication libraries implicitly assume all the processing elements are equidistant from each other. In order to take advantage of emerging technologies, application developers need a set of programming abstractions to describe data locality for the new computing ecosystem. The new programming paradigm should be more data centric and allow developers to describe how to decompose and how to lay out data in memory. Fortunately, there are many emerging concepts, such as constructs for tiling, data layout, array views, task and thread affinity, and topology-aware communication libraries for managing data locality. There is an opportunity to identify commonalities in strategy so that the best of these concepts can be combined into a comprehensive approach to expressing and managing data locality on exascale programming systems. These programming model abstractions can expose crucial information about data locality to the compiler and runtime system to enable performance-portable code. The research question is to identify the right level of abstraction, with candidate techniques ranging from template libraries all the way to completely new languages.
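Among the locality-preserving abstractions the report lists, tiling is the simplest to show in a few lines. The sketch below is a generic illustration in Python, not drawn from any particular exascale programming model; the function name and tile size are ours.

```python
# Illustrative sketch of tiling: restructure a 2-D traversal into blocks so
# each tile of data is reused before the computation moves on, instead of
# streaming row by row through the whole array.

import numpy as np

def tiled_indices(nx, ny, tile):
    """Yield (i, j) index pairs tile by tile instead of row by row."""
    for ti in range(0, nx, tile):
        for tj in range(0, ny, tile):
            for i in range(ti, min(ti + tile, nx)):
                for j in range(tj, min(tj + tile, ny)):
                    yield i, j

# Usage: apply a simple elementwise update over a tiled traversal.
a = np.arange(64.0).reshape(8, 8)
b = np.zeros_like(a)
for i, j in tiled_indices(8, 8, tile=4):
    b[i, j] = 2.0 * a[i, j]
print(b[:2])
```

The same idea generalizes to the other constructs the report names: data layouts and array views describe where the tiles live in memory, while affinity and topology-aware communication describe where they live across the machine.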